INTRODUCTION.


In this paper we present several results aimed at a solution to the feature 
selection problem. While we do not claim to have obtained a "better solution" 
to the feature selection problem than that presently used, we believe our results 
are potentially applicable and, we hope, will offer more insight into the feature 
selection problem. 


SECTION 1 - REDUCTION OF THE NUMBER OF VARIABLES IN "BEST B'S". 


The technique presently used to solve the feature selection problem (that is, to
determine a $k \times n$ matrix $A$ which maximizes the average interclass divergence
$D_A$ for the k-case) essentially treats each of the $k \cdot n$ entries of $A$ as a
variable which must be determined. Here, by using the fact that $D_{PA} = D_A$ for
any $k \times k$ invertible matrix $P$, we make a few simple observations to show that
the number of variables in $A$ which must be determined can always be taken to be
$\le k[(n-k) + (k+1)/2]$ and perhaps, from a probabilistic point of view, can be assumed
to be $\le k[n-k]$. For the algebraic definitions and results which we make use of
here, we refer the reader to D. Finkbeiner [2; Chapter 6].

We begin with two definitions.

DEFINITION 1. Let $A$ and $B$ be $\ell \times m$ matrices. We say that $A$ is row
equivalent to $B$ if and only if there exists an invertible $\ell \times \ell$ matrix $P$
such that $A = PB$.

DEFINITION 2. We say the $\ell \times m$ matrix $E$ is in reduced echelon form
if and only if the following three conditions are satisfied:

(i) The first $\rho$ rows of $E$ are nonzero; the other rows are zero
(where $\rho = \operatorname{rank} E$).

(ii) The first nonzero element in each nonzero row is 1, and it appears
in a column to the right of the first nonzero element of any preceding row.

(iii) The first nonzero element in each nonzero row is the only nonzero
element in its column.

Thus, if $E = (e_{ij})$ is an $\ell \times m$ matrix, $\ell \le m$, and if $E$ is in reduced
echelon form, then $E$ is "upper triangular", where by "upper triangular" we mean
that $e_{ij} = 0$ if $i > j$, $j = 1, \ldots, \ell-1$. Observe that if $\ell = m$, then $E$ is
invertible if and only if $E = I_m$.

THEOREM 1. Any $\ell \times m$ matrix $T$ of rank $\rho$ is row equivalent to an
$\ell \times m$ matrix in reduced echelon form with $\rho$ nonzero rows. In particular,
if $\ell = m = \rho$ (that is, $T$ is an invertible $m \times m$ matrix), then the reduced
echelon matrix to which $T$ is row equivalent is $I_m$.

Proof: See [2; p. 124].
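The reduction described in Theorem 1 can be illustrated numerically. The following is only a minimal sketch, assuming exact rational arithmetic and ordinary Gauss-Jordan elimination; the function name `reduced_echelon` and the two sample matrices are illustrative choices and do not come from [2].

```python
from fractions import Fraction

def reduced_echelon(rows):
    """Return the reduced echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(n_cols):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        src = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivot_row], m[src] = m[src], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]  # leading entry becomes 1
        for r in range(n_rows):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return m

# Hypothetical sample data: an invertible 3 x 3 matrix and a 3 x 3 matrix of rank 2.
T = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
S = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]
E = reduced_echelon(T)  # the identity, as in the last statement of Theorem 1
F = reduced_echelon(S)  # two nonzero rows followed by a zero row
```

Each elementary row operation used here is multiplication on the left by an invertible matrix, so the result is row equivalent to the input in the sense of Definition 1.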

In particular, let $B = (B_1, B_2)$ be a $k \times n$ matrix, $k < n$, where $B_1$
and $B_2$ are $k \times k$ and $k \times (n-k)$ matrices, respectively. Suppose further
that $\operatorname{rank} B = k$. By Theorem 1, $B_1$ is row equivalent to a $k \times k$ matrix
in reduced echelon form. Thus, there exists a $k \times k$ invertible matrix $P$ such
that $C_1 = PB_1$ and in particular, $C = PB = (PB_1, PB_2) = (C_1, C_2)$, where $C_1$ is
a $k \times k$ matrix in reduced echelon form. Since $C_1$ is in reduced echelon form,
$C_1$ is upper triangular, and hence $C_1$ has at most $1 + 2 + \cdots + k = (1/2)k(k+1)$
nonzero entries (in fact, if $C_1$ has $(1/2)k(k+1)$ nonzero entries, then all diagonal
elements of $C_1$ are nonzero and hence $C_1$ is invertible. Thus, $C_1 = I_k$ and has
exactly $k$ nonzero entries). Therefore, unless $k = 1$, we can say that $C_1$ has
$< (1/2)k(k+1)$ nonzero entries.

In general, even though $B = (B_1, B_2)$ has rank $k$, and hence the row rank
of $B$ = the column rank of $B$ = $k$, it is not necessarily true that the first $k$
columns of $B$ are linearly independent, and hence $B_1$ need not be invertible.
However, if $B_1$ is invertible, then by Theorem 1, $C_1 = I_k$ and hence
$C = (I_k, C_2)$ has $k(n-k)$ "unknown" elements, namely the elements of $C_2$.

Using the fact that the average interclass divergence for the k-case is
invariant for row equivalent matrices (that is, $D_{PB} = D_B$, where $B$ is a $k \times n$
matrix and $P$ is a $k \times k$ invertible matrix), we have proven the following
theorem.

THEOREM 2. Let $k$ and $n$ be positive integers, $1 < k < n$. There exists a
"best C" which maximizes the average interclass divergence $D_B$ for the k-case
of the form: $C = (C_1, C_2)$, where $C_1$ is a $k \times k$ upper triangular matrix in
reduced echelon form and $C_2$ is a $k \times (n-k)$ matrix. Moreover, if there exists
a "best B" (for the k-case) of the form $B = (B_1, B_2)$, where $B_1$ is a $k \times k$
invertible matrix, then we may take $C_1 = I_k$, and hence there are at most $k(n-k)$
unknown entries in $C$, namely, the entries of $C_2$.
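As a small worked illustration of the second statement of Theorem 2, take $k = 2$ and $n = 4$. The numerical matrix below is a hypothetical example chosen only so that $B_1$ is invertible; it is not a "best B" for any particular data.

```latex
\[
B = (B_1, B_2) =
\begin{pmatrix} 2 & 1 & 4 & 6 \\ 1 & 1 & 3 & 5 \end{pmatrix},
\qquad
P = B_1^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix},
\]
\[
C = PB = (I_2, C_2) =
\begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 2 & 4 \end{pmatrix},
\qquad D_C = D_{PB} = D_B .
\]
```

Only the $k(n-k) = 4$ entries of $C_2$ remain free, compared with the $k \cdot n = 8$ entries of $B$.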

With regard to the assumption that $B_1$ is invertible, examples have been
given to show that there exist "best B's" for which $B_1$ is not invertible.
However, because of the example given in [1], this does not guarantee that
there does not exist a "best B" for which $B_1$ is invertible. Indeed, from
a probabilistic point of view, it would seem reasonable to assume that $B_1$ is
invertible. We state two questions:

Question 1: Let $k$ and $n$ be positive integers, $1 < k < n$. Does there
exist a $k \times n$ matrix $B = (B_1, B_2)$, where $B_1$ and $B_2$ are $k \times k$ and
$k \times (n-k)$ matrices, respectively, which maximizes $D_B$ for the k-case and for
which $B_1$ is invertible?

Question 2: Given $\varepsilon > 0$, does there exist a $k \times n$ matrix $B = (B_1, B_2)$,
where $B_1$ is a $k \times k$ invertible matrix, for which $|D_B - D_{B_k}| < \varepsilon$, where
$D_{B_k}$ denotes the maximum value of the average interclass divergence for the k-case?


Finally, we observe that Homer Walker has recently shown that if $U$ is
any $n \times n$ unitary matrix and $k$ is any positive integer, $1 \le k < n$, then
there exist $\ell \le \min\{k, n-k\}$ Householder matrices $H_1, \ldots, H_\ell$ such that
$D_{(I_k \mid Z)U} = D_{(I_k \mid Z)H_\ell \cdots H_1}$. In particular, if $U$ is such that
$D_{(I_k \mid Z)U}$ is the maximum value of divergence for the k-case, then Walker's
result effects a reduction in the number of unknowns to be determined, from $k \cdot n$
(the entries of $(I_k \mid Z)U$) to $n \cdot \ell$, since each Householder matrix is completely
determined by a column n-vector $\xi$ of norm 1. (In fact, if $H = I - 2\xi\xi^T$ is a
Householder matrix, where $\xi = (\xi_1, \ldots, \xi_n)^T$ and $\xi_1^2 + \cdots + \xi_n^2 = 1$, then
$\xi_n = \pm\sqrt{1 - (\xi_1^2 + \cdots + \xi_{n-1}^2)}$, and thus, up to sign, $H$ is determined by
$n-1$ variables, and it may be argued that Walker's result reduces the number of
unknowns to $(n-1)\ell$.)

However, if $D_B$ is the maximum value of divergence for the k-case, and
if $B = (I_k \mid Z)U = (B_1, B_2)$, where $B_1$ is a $k \times k$ invertible matrix, then we
have shown that we may take $B_1 = I_k$, and hence the number of unknowns in $B$
is $k(n-k)$. Thus, since $k(n-k) \le (n-1)\ell$ always (where $\ell = \min\{k, n-k\}$),
and since $k(n-k) = (n-1)\ell$ if and only if $k = 1$ or $k = n-1$, the result presented
here effects a further reduction in the number of unknowns, if $1 < k < n-1$.
(In actual practice, if one were to make use of Walker's result in attempting to
determine a "best B", it appears that it would be necessary to take $\ell = \min\{k, n-k\}$.)
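The counts compared in this section are simple enough to tabulate. The sketch below assumes $\ell = \min\{k, n-k\}$ in Walker's count, as the closing remark suggests; the function names are illustrative only.

```python
def unknowns_full(k, n):
    """All k*n entries of a k x n matrix treated as variables."""
    return k * n

def unknowns_walker(k, n):
    """(n-1)*l unknowns, taking l = min(k, n-k) Householder matrices."""
    return (n - 1) * min(k, n - k)

def unknowns_echelon_bound(k, n):
    """Upper bound k(n-k) + k(k+1)/2 when C_1 is only known to be in echelon form."""
    return k * (n - k) + k * (k + 1) // 2

def unknowns_identity_block(k, n):
    """k(n-k) unknowns when the leading k x k block may be taken to be I_k."""
    return k * (n - k)

# Check k(n-k) <= (n-1)*l, with equality exactly when k = 1 or k = n-1.
inequality_holds = True
for n in range(2, 40):
    for k in range(1, n):
        ours, walker = unknowns_identity_block(k, n), unknowns_walker(k, n)
        if ours > walker or (ours == walker) != (k == 1 or k == n - 1):
            inequality_holds = False

example = (unknowns_full(4, 10), unknowns_walker(4, 10),
           unknowns_echelon_bound(4, 10), unknowns_identity_block(4, 10))
```

For $k = 4$, $n = 10$ the four counts are 40, 36, 34 and 24, respectively.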


SECTION 2 - ITERATIVE SELECTION OF $H_i$'S.


It has been established that there exists a $k \times n$ matrix $B$ of the form
$(I_k \mid Z)U$, where $U$ is an $n \times n$ unitary matrix, which maximizes the average
interclass divergence $D_B$ for the k-case. Moreover, as indicated in Section 1,
it has been shown that there exist $\ell \le \min\{k, n-k\}$ Householder matrices
$H_1, \ldots, H_\ell$ for which $D_{(I_k \mid Z)H_\ell \cdots H_1}$ is a maximum.

Recently, it has been proposed that one might obtain a "best B" by successively
maximizing over the set $\mathcal{H}_n$ of $n \times n$ Householder matrices the function:

\[ D_{(I_k \mid Z)XH_\ell \cdots H_1} : \mathcal{H}_n \to \mathbb{R} \quad (\mathbb{R} \text{ is the set of real numbers}), \]

where $X$ is a variable with domain $\mathcal{H}_n$ and, for each $j = 1, \ldots, \ell$, $H_j \in \mathcal{H}_n$
is such that $D_{(I_k \mid Z)H_j \cdots H_1}$ is the maximum value of the function:

\[ D_{(I_k \mid Z)XH_{j-1} \cdots H_1} : \mathcal{H}_n \to \mathbb{R}. \]

A more detailed development of this, and in particular, a proof of the fact
that $D_{(I_k \mid Z)XH_{j-1} \cdots H_1}$ actually attains a maximum on $\mathcal{H}_n$, will appear elsewhere.

Several questions have been raised in conjunction with this suggestion and
most still remain unanswered. In this brief section we present two observations
and ask five questions related to this proposal.

We first note that the $n \times n$ Householder matrices are precisely those
matrices of the form: $I_n - 2\xi\xi^T$, where $\xi$ is a column n-unit vector. In
particular, every Householder matrix $H$ is a symmetric unitary matrix and hence
$H = H^{-1}$. Of prime importance, Henry Decell has shown that:

\[ D_{(I_k \mid Z)H_p \cdots H_1} \le D_{(I_k \mid Z)H_{p+1}H_p \cdots H_1}, \]

where, for each $i = 1, \ldots, p+1$, $H_i$ denotes a Householder matrix which
maximizes (over $\mathcal{H}_n$) $D_{(I_k \mid Z)XH_{i-1} \cdots H_1}$. For our first observation, we make
use of the fact that $\mathcal{H}_n$ is invariant under the group of inner automorphisms of
the group $\mathcal{U}_n$ of unitary matrices. In particular, $HH_1H^{-1} = HH_1H \in \mathcal{H}_n$
whenever $H_1, H \in \mathcal{H}_n$.
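The properties of Householder matrices used here can be checked on a small example. This is only a sketch under stated assumptions: the two rational unit vectors are hypothetical sample data, exact arithmetic is used to avoid rounding, and the identity $HH_1H = I - 2(H\xi_1)(H\xi_1)^T$ is the standard computation behind the invariance just mentioned, not a formula quoted from the text.

```python
from fractions import Fraction

def householder(xi):
    """Return I - 2 xi xi^T for a unit vector xi, as a list of rows."""
    n = len(xi)
    return [[(1 if i == j else 0) - 2 * xi[i] * xi[j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# Hypothetical unit vectors with rational entries.
xi = [Fraction(3, 5), Fraction(4, 5), Fraction(0)]
xi1 = [Fraction(2, 3), Fraction(1, 3), Fraction(2, 3)]
H, H1 = householder(xi), householder(xi1)

is_symmetric = H == transpose(H)
is_own_inverse = matmul(H, H) == identity(3)

# Conjugating H1 by H gives the Householder matrix of the unit vector H xi1.
H_xi1 = [sum(H[i][t] * xi1[t] for t in range(3)) for i in range(3)]
conjugate = matmul(matmul(H, H1), H)
conjugate_is_householder = conjugate == householder(H_xi1)
```

All three flags evaluate to true for this data, in agreement with $H = H^{-1}$ and $HH_1H \in \mathcal{H}_n$.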


PROPOSITION. Let $G_1, \ldots, G_\ell \in \mathcal{H}_n$. (That is, $G_1, \ldots, G_\ell$ are fixed but
arbitrary Householder matrices.) Let $G_{\ell+1}$ maximize the function:

\[ D_{(I_k \mid Z)XG_\ell \cdots G_1} : \mathcal{H}_n \to \mathbb{R}. \]

Then, for any $G \in \mathcal{H}_n$,

\[ D_{(I_k \mid Z)G_\ell \cdots G_1G} \le D_{(I_k \mid Z)G_{\ell+1}G_\ell \cdots G_1}. \]

Henry Decell has pointed out to the authors the following generalization
of the Proposition.

RESULT. Under the same hypothesis as for the Proposition, if $G$ is any
element of $\mathcal{H}_n$, then, for any $p = 0, 1, \ldots, \ell-1$,

\[ D_{(I_k \mid Z)G_\ell \cdots G_{\ell-p}\,G\,G_{\ell-(p+1)} \cdots G_1} \le D_{(I_k \mid Z)G_{\ell+1}G_\ell \cdots G_1}. \]

Proof: Observe that $G_\ell \cdots G_{\ell-p}\,G\,G_{\ell-p} \cdots G_\ell = H \in \mathcal{H}_n$ and hence,

\[ D_{(I_k \mid Z)HG_\ell \cdots G_1} \le D_{(I_k \mid Z)G_{\ell+1}G_\ell \cdots G_1}. \]

But

\[ HG_\ell \cdots G_1 = [G_\ell \cdots G_{\ell-p}\,G\,(G_{\ell-p} \cdots G_\ell)](G_\ell \cdots G_{\ell-p})(G_{\ell-(p+1)} \cdots G_1)
 = G_\ell \cdots G_{\ell-p}\,G\,G_{\ell-(p+1)} \cdots G_1, \]

since $(G_{\ell-p} \cdots G_\ell)(G_\ell \cdots G_{\ell-p}) = I_n$.

As a consequence of the Proposition or Result, observe that if, for each
$i > 0$, $H_i$ denotes a Householder matrix which maximizes the function
$D_{(I_k \mid Z)XH_{i-1} \cdots H_1}$, $X \in \mathcal{H}_n$, then, for any $p \ge 1$,

\[ D_{(I_k \mid Z)H_p \cdots H_1} \le D_{(I_k \mid Z)T_{p+1}H_p \cdots H_2} \le \cdots \le D_{(I_k \mid Z)T_{2p} \cdots T_{p+1}}, \]

where $T_{p+j} \in \mathcal{H}_n$ for each $j \in \omega$, and $T_{p+j}$ maximizes the function
$D_{(I_k \mid Z)XT_{p+j-1} \cdots T_{p+1}H_p \cdots H_{j+1}}$, $X \in \mathcal{H}_n$, if $j < p$. (If $j \ge p$, then no $H_i$'s
appear and the form of the function maximized is clear.)

This points out the known fact that the $H_i$'s are not unique, and perhaps
indicates that one must be "selective" in choosing the $H_i$'s which maximize at
each step.

We list five questions below; the first one seems to continually arise in
any attempt to show that a "best B" can be reached by this successive maximization
over the set $\mathcal{H}_n$.


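Question 1: If $D_{(I_k \mid Z)H} \le D_{(I_k \mid Z)G}$, where $H, G \in \mathcal{H}_n$, is
$D_{(I_k \mid Z)H_1H} \le D_{(I_k \mid Z)H_1G}$, where $H_1$ represents an arbitrary element of $\mathcal{H}_n$?

Question 2: Let $H_1, \ldots, H_j, G_1, \ldots, G_j \in \mathcal{H}_n$ and suppose that
$D_{(I_k \mid Z)G_j \cdots G_1} \le D_{(I_k \mid Z)H_j \cdots H_1}$. Let $G_{j+1}$ and $H_{j+1}$ maximize
$D_{(I_k \mid Z)XG_j \cdots G_1}$ and $D_{(I_k \mid Z)XH_j \cdots H_1}$, respectively. Is
$D_{(I_k \mid Z)G_{j+1}G_j \cdots G_1} \le D_{(I_k \mid Z)H_{j+1}H_j \cdots H_1}$?

Question 3: If the answer to Question 2 is "No, in general", can we choose
$G'_1, \ldots, G'_j$ and $H'_1, \ldots, H'_j$ so that $D_{(I_k \mid Z)G'_i \cdots G'_1} = D_{(I_k \mid Z)G_i \cdots G_1}$ and
$D_{(I_k \mid Z)H'_i \cdots H'_1} = D_{(I_k \mid Z)H_i \cdots H_1}$, for each $i = 1, \ldots, j$, and for which there exist
$G'_{j+1}$ and $H'_{j+1}$ which maximize $D_{(I_k \mid Z)XG'_j \cdots G'_1}$ and $D_{(I_k \mid Z)XH'_j \cdots H'_1}$, respectively,
and are such that $D_{(I_k \mid Z)G'_{j+1}G'_j \cdots G'_1} \le D_{(I_k \mid Z)H'_{j+1}H'_j \cdots H'_1}$?

Question 4: Replace $\le$ in Question 1 by $=$. That is, let $H, G \in \mathcal{H}_n$
and suppose that $D_{(I_k \mid Z)H} = D_{(I_k \mid Z)G}$. Is $D_{(I_k \mid Z)H_1H} = D_{(I_k \mid Z)H_1G}$, where $H_1$
represents an arbitrary element of $\mathcal{H}_n$?

Question 5: In Question 4, replace "$H_1 \in \mathcal{H}_n$" with "$H_1$ maximizes
$D_{(I_k \mid Z)XH}$ and $G_1$ maximizes $D_{(I_k \mid Z)XG}$". Is $D_{(I_k \mid Z)H_1H} = D_{(I_k \mid Z)G_1G}$?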


SECTION 3 - A REPRESENTATION THEOREM USING $H_i$'S.


As observed in Section 1, Homer Walker has recently shown that if
$B = (I_k \mid Z)U$, where $U$ is an $n \times n$ unitary matrix, then there exist
$\ell \le \min\{k, n-k\}$ Householder matrices $H_1, \ldots, H_\ell$ such that
$D_B = D_{(I_k \mid Z)H_\ell \cdots H_1}$. In particular, this fact follows directly from the
following two results.

(i) There exist $1 \le v \le k$ Householder matrices $G_1, \ldots, G_v$ such that
$(I_k \mid Z)U = (I_k \mid Z)G_v \cdots G_1$.

(ii) There exist $1 \le j \le n-k$ Householder matrices $T_1, \ldots, T_j$ such that
$(I_k \mid Z)U = A(I_k \mid Z)T_j \cdots T_1$, where $A$ is a $k \times k$ invertible matrix.

In particular, if $k = n-1$, then $n-k = 1$ and thus, by (ii), there exists
a Householder matrix $H$ and an invertible $(n-1) \times (n-1)$ matrix $A$ such that
$(I_{n-1} \mid Z)U = A(I_{n-1} \mid Z)H$. In this section we make repeated use of this result,
together with the fact that the nth row of $H$ can be chosen equal to the nth
row of $U$, to derive a result which has application in attempting to maximize
divergence by a procedure closely related to that suggested in Section 2.

Throughout this section $\mathcal{U}_\ell$ and $\mathcal{H}_\ell$ will denote the sets of $\ell \times \ell$ unitary
and Householder matrices, respectively. Observe that $\mathcal{H}_\ell \subset \mathcal{U}_\ell$ for all positive
integers $\ell$. The main result of this section is the following theorem.


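THEOREM 3. Let $p$ and $n$ be fixed positive integers, $p < n$, and let
$U \in \mathcal{U}_n$. Then there exist $p$ Householder matrices $\hat{H}_1, \ldots, \hat{H}_p$ having the
following property (*): given any integer $\ell$, $1 \le \ell \le p$, there exists $C_\ell \in \mathcal{U}_{n-\ell}$
such that

\[ (I_{n-\ell} \mid Z)U = C_\ell (I_{n-\ell} \mid Z)\hat{H}_\ell \cdots \hat{H}_1. \]

Moreover, for each $j = 1, \ldots, p$,

\[ \hat{H}_j = \begin{pmatrix} H_j & Z \\ Z & I_{j-1} \end{pmatrix}, \]

where $H_j \in \mathcal{H}_{n-(j-1)}$ and $Z$ denotes the zero matrix of appropriate dimension.
Therefore, for any integer $\ell$, $1 \le \ell \le p$, $D_{(I_{n-\ell} \mid Z)U} = D_{(I_{n-\ell} \mid Z)\hat{H}_\ell \cdots \hat{H}_1}$.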

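Before proving Theorem 3, we derive two preliminary results.

LEMMA 1. Let $p$ and $n$ be positive integers, $p < n$, let $A$ be a $p \times p$
matrix, and let

\[ \hat{A} = \begin{pmatrix} A & Z \\ Z & I_{n-p} \end{pmatrix}, \]

where $Z$ denotes the zero matrix of appropriate dimension. Then the following are true.

(i) $\hat{A} \in \mathcal{U}_n$ if and only if $A \in \mathcal{U}_p$.

(ii) $\hat{A} \in \mathcal{H}_n$ if and only if $A \in \mathcal{H}_p$.

Moreover, $A(I_p \mid Z_{p \times (n-p)}) = (I_p \mid Z_{p \times (n-p)})\hat{A}$.

Proof: Observe that

\[ (\hat{A})^T = \begin{pmatrix} A^T & Z \\ Z & I_{n-p} \end{pmatrix}
\quad\text{and therefore}\quad
\hat{A}(\hat{A})^T = \begin{pmatrix} AA^T & Z \\ Z & I_{n-p} \end{pmatrix}. \]

Thus, for (i) we have: $\hat{A} \in \mathcal{U}_n \iff \hat{A}(\hat{A})^T = I_n \iff AA^T = I_p \iff A \in \mathcal{U}_p$.

For (ii) we have: $\hat{A} \in \mathcal{H}_n \iff \hat{A} = I_n - 2\hat{\xi}\hat{\xi}^T$ for a unit n-vector
$\hat{\xi} = (\xi^T, 0)^T \iff A = I_p - 2\xi\xi^T$, where $\xi$ is a unit p-vector.

The last statement of Lemma 1 follows directly from matrix multiplication.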
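Lemma 1 can be checked on a small instance. The sketch below assumes $p = 2$, $n = 4$ and a hypothetical rational unit vector; the helper names `embed` and `truncation` are illustrative and do not appear in the text.

```python
from fractions import Fraction

def householder(xi):
    """Return I - 2 xi xi^T for a unit vector xi, as a list of rows."""
    m = len(xi)
    return [[(1 if i == j else 0) - 2 * xi[i] * xi[j] for j in range(m)]
            for i in range(m)]

def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def embed(a, n):
    """Return the n x n block matrix [[A, Z], [Z, I_{n-p}]] for a p x p matrix A."""
    p = len(a)
    return [[a[i][j] if i < p and j < p else (1 if i == j else 0)
             for j in range(n)] for i in range(n)]

def truncation(p, n):
    """Return the p x n matrix (I_p | Z)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(p)]

p, n = 2, 4
xi = [Fraction(3, 5), Fraction(4, 5)]          # hypothetical unit p-vector
A = householder(xi)                            # A is a p x p Householder matrix
A_hat = embed(A, n)

# (ii): A_hat is the Householder matrix of xi padded with zeros.
embedded_is_householder = A_hat == householder(xi + [Fraction(0)] * (n - p))

# Last statement of Lemma 1: A (I_p | Z) = (I_p | Z) A_hat.
intertwines = matmul(A, truncation(p, n)) == matmul(truncation(p, n), A_hat)
```

Both flags evaluate to true here; the second identity is the one used repeatedly in the proof of Theorem 3.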


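Observe that if $k \in \omega$ and if $H \in \mathcal{H}_k$, $U \in \mathcal{U}_k$, then $U = (UH)H$,
where $UH \in \mathcal{U}_k$. Moreover, this representation is unique; that is, if $U = AH$,
then $A = UH \in \mathcal{U}_k$.

LEMMA 2. For any positive integer $k$, let $H \in \mathcal{H}_k$, $U \in \mathcal{U}_k$, where the
kth row of $H$ is the kth row of $U$. If $A$ is a $(k-1) \times (k-1)$ matrix such that
$(I_{k-1} \mid Z)U = A(I_{k-1} \mid Z)H$, then $A \in \mathcal{U}_{k-1}$.

Proof: Let

\[ \hat{A} = \begin{pmatrix} A & Z \\ Z & 1 \end{pmatrix}. \]

The first $k-1$ rows of $\hat{A}H$ are $A(I_{k-1} \mid Z)H = (I_{k-1} \mid Z)U$, and the kth row of
$\hat{A}H$ is the kth row of $H$, which is the kth row of $U$; hence $U = \hat{A}H$.
Thus, by the observation preceding Lemma 2, $\hat{A} = UH \in \mathcal{U}_k$ and therefore, by Lemma
1, $A \in \mathcal{U}_{k-1}$.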


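Proof of Theorem 3.

Fix $n > 1$ and let $U \in \mathcal{U}_n$. We proceed by induction on $p$. If $p = 1$, then
by result (ii) there exists an $(n-1) \times (n-1)$ invertible matrix $C_1$ and
$\hat{H}_1 = H_1 \in \mathcal{H}_n$ such that $(I_{n-1} \mid Z)U = C_1(I_{n-1} \mid Z)\hat{H}_1$. Moreover, the nth row
of $\hat{H}_1$ can be chosen to be the same as the nth row of $U$, and hence, by Lemma 2,
$C_1 \in \mathcal{U}_{n-1}$.

For the sake of clarity we will go through the case for $p = 2$. By result
(ii), applied to $(I_{n-2} \mid Z)C_1$, there exists an $(n-2) \times (n-2)$ invertible matrix
$C_2$ and $H_2 \in \mathcal{H}_{n-1}$ such that $(I_{n-2} \mid Z)C_1 = C_2(I_{n-2} \mid Z)H_2$, where the (n-1)st
row of $H_2$ is chosen equal to the (n-1)st row of $C_1$. Thus, by Lemma 2, $C_2 \in \mathcal{U}_{n-2}$.
Therefore, with $\hat{H}_2 = \begin{pmatrix} H_2 & Z \\ Z & 1 \end{pmatrix}$ and Lemma 1,

\[ (I_{n-2} \mid Z)U = (I_{n-2} \mid Z)(I_{n-1} \mid Z)U = (I_{n-2} \mid Z)C_1(I_{n-1} \mid Z)\hat{H}_1 \]
\[ = C_2(I_{n-2} \mid Z)H_2(I_{n-1} \mid Z)\hat{H}_1 = C_2(I_{n-2} \mid Z)(I_{n-1} \mid Z)\hat{H}_2\hat{H}_1 = C_2(I_{n-2} \mid Z)\hat{H}_2\hat{H}_1. \]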

Suppose now that the theorem is true for $p$; we show it is true for $p+1$.
(That is, suppose we have constructed $p$ $n \times n$ Householder matrices $\hat{H}_1, \ldots, \hat{H}_p$
having property (*). We construct $\hat{H}_{p+1}$, having the desired form, and such that
$(I_{n-(p+1)} \mid Z)U = C_{p+1}(I_{n-(p+1)} \mid Z)\hat{H}_{p+1}\hat{H}_p \cdots \hat{H}_1$, where $C_{p+1} \in \mathcal{U}_{n-(p+1)}$.)

Let $(I_{n-p} \mid Z)U = C_p(I_{n-p} \mid Z)\hat{H}_p \cdots \hat{H}_1$, where $C_p \in \mathcal{U}_{n-p}$ and, for each
$i = 1, \ldots, p$, $\hat{H}_i$ has property (*). Then by result (ii), applied to
$(I_{n-(p+1)} \mid Z)C_p$, and by Lemma 2 there exist $C_{p+1} \in \mathcal{U}_{n-(p+1)}$ and
$H_{p+1} \in \mathcal{H}_{n-p}$ such that

\[ (I_{n-(p+1)} \mid Z)C_p = C_{p+1}(I_{n-(p+1)} \mid Z)H_{p+1}. \]

If

\[ \hat{H}_{p+1} = \begin{pmatrix} H_{p+1} & Z \\ Z & I_p \end{pmatrix}, \]

then $\hat{H}_{p+1} \in \mathcal{H}_n$ and $H_{p+1}(I_{n-p} \mid Z_{(n-p) \times p}) = (I_{n-p} \mid Z_{(n-p) \times p})\hat{H}_{p+1}$. Thus,

\[ (I_{n-(p+1)} \mid Z)U = (I_{n-(p+1)} \mid Z)(I_{n-p} \mid Z_{(n-p) \times p})U \]
\[ = (I_{n-(p+1)} \mid Z)C_p(I_{n-p} \mid Z_{(n-p) \times p})\hat{H}_p \cdots \hat{H}_1 \]
\[ = C_{p+1}(I_{n-(p+1)} \mid Z)H_{p+1}(I_{n-p} \mid Z_{(n-p) \times p})\hat{H}_p \cdots \hat{H}_1 \]
\[ = C_{p+1}(I_{n-(p+1)} \mid Z)(I_{n-p} \mid Z_{(n-p) \times p})\hat{H}_{p+1}\hat{H}_p \cdots \hat{H}_1 \]
\[ = C_{p+1}(I_{n-(p+1)} \mid Z)\hat{H}_{p+1}\hat{H}_p \cdots \hat{H}_1. \]

Thus, by induction, the proof is complete.

As already remarked in Section 2, it has recently been proposed that one
might obtain a "best B" by successively maximizing over the set $\mathcal{H}_n$ the
average interclass divergence

\[ D_{(I_k \mid Z)XH_\ell \cdots H_1} : \mathcal{H}_n \to \mathbb{R}, \]

where $X \in \mathcal{H}_n$. (See Section 2 for a full statement of the proposed procedure.)
Another procedure, closely related to the preceding, is to successively maximize
over the set $\mathcal{H}_{n-p}$ the function:

\[ \psi_{p+1} : \mathcal{H}_{n-p} \to \mathbb{R}, \]

where

\[ \psi_{p+1}(X) = D_{(I_{n-(p+1)} \mid Z)X(I_{n-p} \mid Z)H_p \cdots (I_{n-1} \mid Z)H_1} \]

and, for each $i = 1, \ldots, p$, $H_i \in \mathcal{H}_{n-(i-1)}$ maximizes the function
$\psi_i : \mathcal{H}_{n-(i-1)} \to \mathbb{R}$.

Note that the problem of obtaining $H_{p+1} \in \mathcal{H}_{n-p}$ which maximizes the above
function is a "new feature selection problem". In particular, the problem of
maximizing over the set $\mathcal{H}_{n-1}$ the function:

\[ \psi_2(X) = D_{(I_{n-2} \mid Z)X(I_{n-1} \mid Z)H_1} \]

is the new feature selection problem with new statistics
$(I_{n-1} \mid Z)H_1\Omega_iH_1^T(I_{n-1} \mid Z)^T$ and $(I_{n-1} \mid Z)H_1\mu_i$, $i = 1, \ldots, m$, as "new
covariance matrices and means", respectively (where $\Omega_1, \ldots, \Omega_m$ and
$\mu_1, \ldots, \mu_m$ denote the covariance matrices and means, respectively, for the
m classes of the original problem). More details on this procedure will appear elsewhere.

Now, $B = (I_{n-p} \mid Z)H_p \cdots (I_{n-1} \mid Z)H_1$ is an $(n-p) \times n$ matrix and hence $D_B$
is a particular value for the average interclass divergence for the case
$k = n-p$; thus $D_B \le D_{B_{n-p}}$, where $D_{B_{n-p}}$ denotes the maximum value of the average
interclass divergence for the case $k = n-p$. Is it possible that $D_B = D_{B_{n-p}}$?
Less hopefully, can we determine a $U \in \mathcal{U}_n$ such that $D_B = D_{(I_{n-p} \mid Z)U}$?

Theorem 3, or more precisely, the proof used in Theorem 3, provides an
affirmative answer to our last question. In particular, the proof of Theorem 3 shows
that

\[ B = (I_{n-p} \mid Z)\hat{H}_p \cdots \hat{H}_1, \qquad \hat{H}_j = \begin{pmatrix} H_j & Z \\ Z & I_{j-1} \end{pmatrix}, \]

so that $U = \hat{H}_p \cdots \hat{H}_1 \in \mathcal{U}_n$ has the required property.

SECTION 4 - A CONCLUDING OBSERVATION 

We conclude the paper with an observation and two questions. Homer Walker
has pointed out to us that if $U$ is any $n \times n$ matrix, then
$D_{(I_k \mid Z)U} \le D_{(I_{k+1} \mid Z)U}$. In particular, this is true if $U$ is unitary and if
$D_{(I_k \mid Z)U}$ is the maximum value for $D_B$, $B \in M_{k,n} = \{A \mid A$ is a $k \times n$ matrix with
real entries$\}$.

Using this fact, observe that if

\[ D_{(I_k \mid Z)U_k} = D_{\begin{pmatrix} u_1^{(k)} \\ \vdots \\ u_k^{(k)} \end{pmatrix}}
\quad\text{and}\quad
D_{(I_{k+1} \mid Z)U_{k+1}} = D_{\begin{pmatrix} u_1^{(k+1)} \\ \vdots \\ u_{k+1}^{(k+1)} \end{pmatrix}} \]

are maximum values for the case $k$ and $k+1$, respectively (where $u_i^{(j)}$ denotes
the ith row of $U_j$), then we have

\[ D_{(I_k \mid Z)U_k} = D_{\begin{pmatrix} u_1^{(k)} \\ \vdots \\ u_k^{(k)} \end{pmatrix}}
\le D_{(I_{k+1} \mid Z)U_k} = D_{\begin{pmatrix} u_1^{(k)} \\ \vdots \\ u_{k+1}^{(k)} \end{pmatrix}}
\le D_{(I_{k+1} \mid Z)U_{k+1}} = D_{\begin{pmatrix} u_1^{(k+1)} \\ \vdots \\ u_{k+1}^{(k+1)} \end{pmatrix}}. \]
This motivates the following question.

Question 1: Let

\[ D_{(I_j \mid Z)U_j} = D_{\begin{pmatrix} u_1^{(j)} \\ \vdots \\ u_j^{(j)} \end{pmatrix}} \]

be the maximum value for $D_B$, $B \in M_{j,n}$, where $u_i^{(j)}$ denotes the ith row of
$U_j$, an $n \times n$ unitary matrix, for each $j = 1, \ldots, n$, $i = 1, \ldots, j$. Let

\[ C_k = \begin{pmatrix} u_1^{(1)} \\ u_2^{(2)} \\ \vdots \\ u_k^{(k)} \end{pmatrix} \]

for each $k = 1, \ldots, n$. Does $D_{C_k} = D_{(I_k \mid Z)U_k}$ for any $k = 1, \ldots, n$? If not,
do there exist $n \times n$ unitary matrices $U_1, \ldots, U_n$ for which the answer is yes?


Finally, we close with a somewhat unrelated (to what we have discussed in this
paper) question. We have already observed that there exists an $n \times n$ Householder
matrix $H$ such that $D_{(I_{n-1} \mid Z)H}$ is the maximum value for $D_B$, $B \in M_{n-1,n}$. For
$k < n-1$, we do not know if there always exists an $n \times n$ Householder matrix $H_k$
for which $D_{(I_k \mid Z)H_k}$ is the maximum value of $D_{B_k}$, $B_k \in M_{k,n}$. (In fact, the
evidence seems to indicate the contrary, in general.) However, we propose the
following question.

Question 2: Suppose $D_{(I_k \mid Z)H_k}$ is the maximum value for $D_{B_k}$, $B_k \in M_{k,n}$, where
$H_k$ is an $n \times n$ Householder matrix and $k < n-1$. Is $D_{(I_j \mid Z)H_k}$ the
maximum value for $D_{B_j}$, $B_j \in M_{j,n}$, for any $k < j \le n-1$?